An Interactive Japanese Parser for Machine Translation

Author

  • Hiroshi Maruyama
Abstract

In this paper, we describe a working system for interactive Japanese syntactic analysis. A human user can intervene during parsing to help the system produce a correct parse tree. Human interactions are limited to the very simple task of indicating the modifiee (governor) of a phrase, and thus a non-expert native speaker can use the system. The user is free to give any information in any order, or even to provide no information. The system is being used as the source-language analyzer of a Japanese-to-English machine translation system currently under development.

1 Introduction

Despite the long history of research and development, perfect or nearly perfect analysis of a fairly wide range of natural language sentences is still beyond the state of the art. Users of existing batch-style machine translation systems are obliged to post-edit the machine-translated text whenever it contains errors caused by an analysis failure.

We have developed an interactive Japanese syntactic analysis system, JAWB (Japanese Analysis WorkBench), for a Japanese-to-English machine translation system. It can produce very reliable syntactic structures with the help of a human user. User interactions are limited to the very simple task of specifying the modifiee (governor) of a phrase, and thus a non-expert native speaker can use the system. The number of user interactions is minimized by using constraint propagation (Waltz 1975) to eliminate inconsistent alternatives.

One feature of our system not found in previous attempts (Kay 1973, Melby 1980, Tomita 1986) is that the user is completely free to give the system any information in any order. The user may also provide no information at all; in this case, the system runs fully automatically, although the quality of the output may be degraded.

In the next section, we describe the system structure. Then, in Section 3, we discuss the interactive dependency analysis and show a sample session.
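The introduction says that constraint propagation (Waltz 1975) removes alternatives that are inconsistent with the user's answers, but this excerpt does not spell out the constraints JAWB actually uses. As a minimal sketch, assume only two widely cited properties of Japanese dependency structure: every phrase except the last modifies some phrase to its right, and modification links do not cross. The names `crosses`, `propagate`, and `fix_modifiee` are hypothetical, and the loop is a generic AC-3-style arc-consistency pass, not the system's own algorithm.

```python
from collections import deque
from itertools import permutations


def crosses(i, a, k, b):
    """Return True if the link i -> a and the link k -> b cross.

    Phrases are indexed left to right and every link points rightward
    (a > i, b > k), as assumed for head-final Japanese dependencies.
    """
    if i > k:  # normalize so that i is the leftmost dependent
        i, a, k, b = k, b, i, a
    return i < k < a < b


def propagate(domains):
    """AC-3-style arc consistency over the (assumed) non-crossing constraint.

    `domains` maps each phrase to its set of candidate modifiees. A candidate
    is dropped if, for some other phrase, every remaining candidate of that
    phrase would cross it. Mutates `domains` in place and returns False if
    some phrase is left with no candidates (the answers are inconsistent).
    """
    queue = deque(permutations(domains, 2))
    while queue:
        x, y = queue.popleft()
        supported = {vx for vx in domains[x]
                     if any(not crosses(x, vx, y, vy) for vy in domains[y])}
        if supported != domains[x]:
            domains[x] = supported
            if not supported:
                return False
            # x lost candidates, so arcs that depend on x must be rechecked.
            queue.extend((z, x) for z in domains if z not in (x, y))
    return True


def fix_modifiee(domains, phrase, modifiee):
    """Record a user answer ("phrase modifies modifiee") and propagate it."""
    if modifiee not in domains[phrase]:
        raise ValueError(f"{modifiee} is not a candidate modifiee of {phrase}")
    domains[phrase] = {modifiee}
    return propagate(domains)


# Hypothetical four-phrase sentence: phrase 3 (the final predicate) is the
# root, so only phrases 0-2 need a modifiee.
domains = {0: {1, 2, 3}, 1: {2, 3}, 2: {3}}
fix_modifiee(domains, 1, 3)   # the user says: phrase 1 modifies phrase 3
print(domains)                # {0: {1, 3}, 1: {3}, 2: {3}}
```

After the single answer, candidate 2 for phrase 0 disappears because the link 0 -> 2 would cross 1 -> 3, so fewer alternatives remain to ask about. Because each answer only narrows candidate sets, answers can arrive in any order, which matches the freedom the paper describes; a real analyzer would add grammatical constraints on top of the structural one used here.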
Section 4 gives the results of an evaluation of the system.

2 System Structure

The structure of JAWB is shown in Figure 1. Japanese syntactic analysis is divided into two parts: morphological analysis and dependency analysis. An input sentence is first segmented into a sequence of linguistic units called bunsetsu, which can be roughly translated into English as phrases. Each bunsetsu, hereafter called a phrase, consists

Publication date: 1990